Aims of this practical
- Getting started with handling textual data in R
- Basic steps in data cleaning
- Calculating text metrics
- Replicating Zipf’s Law
Task 1: Corpus operations
Before we work with text data in a more advanced manner, we start
with datasets that ship with R packages and then move on to
loading external datasets on which we conduct text-based
operations.
R, and quanteda specifically, contains numerous
“built-in” datasets. You can find these under https://quanteda.io/reference/index.html#section-data.
Once you load the quanteda package, these datasets are
available in your workspace and can be accessed by name.
# load quanteda so that its built-in datasets become available
library(quanteda)

# we use the dataset of inaugural speeches by US presidents as the first example
data_corpus_inaugural
Corpus consisting of 59 documents and 4 docvars.
1789-Washington :
"Fellow-Citizens of the Senate and of the House of Representa..."
1793-Washington :
"Fellow citizens, I am again called upon by the voice of my c..."
1797-Adams :
"When it was first perceived, in early times, that no middle ..."
1801-Jefferson :
"Friends and Fellow Citizens: Called upon to undertake the du..."
1805-Jefferson :
"Proceeding, fellow citizens, to that qualification which the..."
1809-Madison :
"Unwilling to depart from examples of the most revered author..."
[ reached max_ndoc ... 53 more documents ]
To access the individual texts, store the corpus under a shorter name and simply index the object:
us_speeches <- data_corpus_inaugural
us_speeches[1]
Corpus consisting of 1 document and 4 docvars.
1789-Washington :
"Fellow-Citizens of the Senate and of the House of Representa..."
Note that this corpus object also contains docvars
(document-level variables). These are essential for later analyses and
classification tasks. We can see what these variables are as
follows:
docvars(us_speeches)
See more on document variables - including how you can assign them
(useful for later steps) here: https://tutorials.quanteda.io/basic-operations/corpus/docvars/.
For now, it suffices to access an individual docvar with the usual
$ notation:
us_speeches$Year
[1] 1789 1793 1797 1801 1805 1809 1813 1817 1821 1825 1829 1833 1837 1841 1845
[16] 1849 1853 1857 1861 1865 1869 1873 1877 1881 1885 1889 1893 1897 1901 1905
[31] 1909 1913 1917 1921 1925 1929 1933 1937 1941 1945 1949 1953 1957 1961 1965
[46] 1969 1973 1977 1981 1985 1989 1993 1997 2001 2005 2009 2013 2017 2021
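Document variables can also be read with docvars() and assigned with its replacement form. The following is a minimal sketch of both; the Century variable is a hypothetical example introduced purely for illustration and is not part of the original corpus.

```r
library(quanteda)

us_speeches <- data_corpus_inaugural

# read a single docvar by name (same values as us_speeches$Party)
parties <- docvars(us_speeches, "Party")
table(parties)

# assign a new docvar; 'Century' is a made-up variable for illustration
# (simple rule: 1789 -> 18, 2001 -> 21)
docvars(us_speeches, "Century") <- floor(us_speeches$Year / 100) + 1

# check the result for the first few documents
head(docvars(us_speeches)[, c("Year", "Century")])
```

Assigning through docvars() keeps the new variable attached to the corpus, so it travels along when the corpus is subset or tokenised later.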
Lastly, each document’s name can be accessed as:
docnames(us_speeches)
[1] "1789-Washington" "1793-Washington" "1797-Adams" "1801-Jefferson"
[5] "1805-Jefferson" "1809-Madison" "1813-Madison" "1817-Monroe"
[9] "1821-Monroe" "1825-Adams" "1829-Jackson" "1833-Jackson"
[13] "1837-VanBuren" "1841-Harrison" "1845-Polk" "1849-Taylor"
[17] "1853-Pierce" "1857-Buchanan" "1861-Lincoln" "1865-Lincoln"
[21] "1869-Grant" "1873-Grant" "1877-Hayes" "1881-Garfield"
[25] "1885-Cleveland" "1889-Harrison" "1893-Cleveland" "1897-McKinley"
[29] "1901-McKinley" "1905-Roosevelt" "1909-Taft" "1913-Wilson"
[33] "1917-Wilson" "1921-Harding" "1925-Coolidge" "1929-Hoover"
[37] "1933-Roosevelt" "1937-Roosevelt" "1941-Roosevelt" "1945-Roosevelt"
[41] "1949-Truman" "1953-Eisenhower" "1957-Eisenhower" "1961-Kennedy"
[45] "1965-Johnson" "1969-Nixon" "1973-Nixon" "1977-Carter"
[49] "1981-Reagan" "1985-Reagan" "1989-Bush" "1993-Clinton"
[53] "1997-Clinton" "2001-Bush" "2005-Bush" "2009-Obama"
[57] "2013-Obama" "2017-Trump" "2021-Biden"
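The document names are also useful for indexing. As a small sketch (assuming the corpus is stored as us_speeches, as above), we can select a speech by name, pull out its raw text, and subset the corpus on a docvar:

```r
library(quanteda)

us_speeches <- data_corpus_inaugural

# index by document name instead of position
lincoln_first <- us_speeches["1861-Lincoln"]

# extract the raw text as a plain character vector
lincoln_text <- as.character(lincoln_first)
substr(lincoln_text, 1, 60)

# keep only the speeches that satisfy a condition on a docvar
post_war <- corpus_subset(us_speeches, Year > 1945)
ndoc(post_war)
```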
Exercise 1.1
Which speech has the highest number of characters per word?
And which one the lowest?
Hint: try to work with the native data.frame structure or with a
data.table. This will require a conversion from the corpus
object.
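To illustrate the conversion mentioned in the hint, here is a sketch on a tiny made-up corpus (the two texts are invented and only serve as an example). It is one possible way to compute characters per word, not the only one; note that nchar() also counts spaces and punctuation, and that the default word tokeniser keeps punctuation marks as tokens.

```r
library(quanteda)

# a tiny made-up corpus (illustrative only)
toy_corpus <- corpus(c(doc_a = "Short words win.",
                       doc_b = "Extraordinarily complicated vocabulary."))

# corpus -> data.frame with the columns 'doc_id' and 'text' (plus any docvars)
toy_df <- convert(toy_corpus, to = "data.frame")

# number of characters and number of word tokens per document
toy_df$nchars <- nchar(toy_df$text)
toy_df$ntoks  <- ntoken(quanteda::tokens(toy_df$text, what = "word"))

# characters per word
toy_df$cpw <- toy_df$nchars / toy_df$ntoks

# rows with the highest and lowest value
toy_df[which.max(toy_df$cpw), c("doc_id", "cpw")]
toy_df[which.min(toy_df$cpw), c("doc_id", "cpw")]
```

The call quanteda::tokens() spells out the package name because other packages may define a function called tokens() as well.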
Exercise 1.2
Which speech contained the most punctuation?
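One way to approach this, sketched on two invented texts, is to tokenise, keep only the tokens that consist of punctuation, and count what is left. The regular expression used here is an assumption about what should count as punctuation; a different pattern may be more appropriate for your data.

```r
library(quanteda)

# two made-up texts (illustrative only)
toy_texts <- c(calm = "One plain sentence",
               busy = "Well, well; what, exactly, is this?!")

toks <- quanteda::tokens(toy_texts, what = "word")

# keep only tokens that consist entirely of punctuation characters
punct_toks <- tokens_select(toks,
                            pattern = "^[[:punct:]]+$",
                            valuetype = "regex",
                            selection = "keep")

# absolute count and proportion relative to all tokens
n_punct <- ntoken(punct_toks)
prop_punct <- n_punct / ntoken(toks)

n_punct
prop_punct
```

Longer speeches naturally contain more punctuation, so the proportion is often the more informative of the two measures.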
Exercise 1.3
How has the average sentence length changed over
time?
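Average sentence length can be approximated as the number of words divided by the number of sentences. The sketch below shows this on two invented texts; removing punctuation before counting words is a choice made here, not a requirement of the exercise.

```r
library(quanteda)

# two made-up texts (illustrative only)
toy_texts <- c(long_style  = "This is a long and winding opening sentence. It is followed by another.",
               short_style = "Short. Very short. Done.")

# number of words (without punctuation) and number of sentences per text
n_words <- ntoken(quanteda::tokens(toy_texts, what = "word", remove_punct = TRUE))
n_sents <- ntoken(quanteda::tokens(toy_texts, what = "sentence"))

# words per sentence
wps <- n_words / n_sents
wps
```

With a year attached to every document, the resulting values can be plotted against time, for example with plot().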

Task 2: First steps with real datasets
Use the data of statements on truthful and deceptive weekend plans
that formed the basis for this
paper: https://www.sciencedirect.com/science/article/pii/S0001691820305746. You can find the raw textual data on the OSF: https://osf.io/rtq9y.
The participants were asked to either tell the truth about their
plans for the upcoming weekend or were assigned an activity from someone
else and had to lie about it (i.e., fabricate a story).
Each participant was asked to provide two statements (1. Please
write about your weekend plans in as much detail as possible.; 2. Which
information could prove that you are telling the truth?). Focus on the
first question (called q1 in the dataset).
The variable outcome_class is either t
(truthful) or d (deceptive).
Exercise 2.1
What is the effect size (Cohen’s d) for the difference in
words per sentence between truthful and deceptive
statements?
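As a reminder, Cohen’s d for two independent groups is the difference between the group means divided by the pooled standard deviation. The following base R sketch spells this out; the function name cohens_d_pooled and the numbers are made up for illustration and are not taken from the dataset. Packages such as effectsize offer ready-made functions for the same purpose.

```r
# Cohen's d for two independent groups, using the pooled standard deviation
# (hypothetical helper function for illustration)
cohens_d_pooled <- function(x, y) {
  n_x <- length(x)
  n_y <- length(y)
  pooled_sd <- sqrt(((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# made-up words-per-sentence values (NOT from the weekend plans data)
wps_truthful  <- c(18, 21, 16, 20, 19)
wps_deceptive <- c(15, 17, 14, 18, 16)

d_value <- cohens_d_pooled(wps_truthful, wps_deceptive)
d_value
```

A positive value here means that the first group (truthful) has the higher mean; swapping the arguments flips the sign.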
Task 3: Replicating Zipf’s Law
A curious “law” in corpus linguistics is Zipf’s Law (see this YouTube
video: https://www.youtube.com/watch?v=fCn8zs912OE).
Zipf’s Law describes the relationship between the frequency of words
in a language and their rank in a frequency-sorted list: the frequency
of any word is inversely proportional to its rank in the frequency
table.
Key aspects of Zipf’s Law:
- Word frequency distribution: In a large enough collection of texts,
the most common word occurs about twice as often as the second most
frequent word, three times as often as the third most frequent word,
etc.
- Mathematical formulation: The law can be expressed as \(f(w) \propto \frac{1}{r}\), where \(f(w)\) is the frequency of word \(w\) and \(r\) is the rank of that word in the frequency table.
- Universality: This distribution is observed across various
languages, as well as in children’s speech and specialized
vocabularies.
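To make the formulation concrete, the sketch below generates the frequencies that Zipf’s Law would predict for the ten most common words, given a hypothetical frequency for the top-ranked word (the value 60000 is an arbitrary assumption). On a log-log scale, the predicted relationship is a straight line with a slope of -1.

```r
# ranks 1 to 10 and a hypothetical frequency for the most common word
ranks <- 1:10
f_top <- 60000

# Zipf's Law predicts that the word at rank r occurs f_top / r times
zipf_pred <- f_top / ranks
zipf_pred

# on a log-log scale, the prediction is a straight line with slope -1
zipf_fit <- lm(log(zipf_pred) ~ log(ranks))
coef(zipf_fit)[["log(ranks)"]]

plot(x = log(ranks), y = log(zipf_pred), type = "b",
     xlab = "log(rank)", ylab = "log(predicted frequency)")
```

For observed data, a slope close to -1 in the same kind of regression would be in line with Zipf’s Law.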
Exercise 3.1
The dataset we will use for this exercise stems from a paper on analysing narrative
shapes in YouTube vlog transcripts (https://aclanthology.org/D18-1394/). In that paper, the video
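transcripts of 30k vlogs were analysed. The dataset can be loaded as
follows:
# adjust the path to where the data directory of the repo lives on your machine
load('./data/vlogs_corpus.RData')
vlogs_corpus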
Does Zipf’s Law apply to a corpus of YouTube vlog
transcripts?
Hint: you will need to obtain the most common words for this
analysis from that corpus. Have a look at the topfeatures()
function. It expects a dfm, so put your tokenised object into a dfm
first (we will learn more about the dfm in the next part).
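The chain from texts to the most common words can be sketched as follows, using two invented sentences in place of the vlog transcripts (the texts are an assumption for illustration only):

```r
library(quanteda)

# two made-up texts standing in for the transcripts
toy_texts <- c(v1 = "the cat and the dog and the bird",
               v2 = "the dog saw the cat")

# tokenise, then build a document-feature matrix
toy_dfm <- dfm(quanteda::tokens(toy_texts))

# named vector of counts, sorted from most to least frequent
top_words <- topfeatures(toy_dfm, n = 3)
top_words

# frequencies expected under Zipf's Law, anchored on the top-ranked word
zipf_expected <- top_words[[1]] / seq_along(top_words)
zipf_expected
```

The observed counts and the expected values can then be compared rank by rank, for example in a plot.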

Exercise 3.2
How do the word frequency ranks in the vlogs corpus deviate
from Google’s 1 Trillion Word Corpus frequency ranks?
You can find a frequency-ranked word list derived from Google’s
Trillion Word Corpus at: https://github.com/first20hours/google-10000-english.
It is also provided in the data directory of this repo
(./data/google_10k_list.txt). These data are already in
ranked order; the file does not contain a header (so set:
header=FALSE).
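A possible workflow is sketched below in base R. A temporary file with five words stands in for the Google list, and the second word list is made up; both are assumptions for illustration. The sketch matches the two lists by word and reports the difference in ranks, which is only one of several reasonable ways to compare them.

```r
# stand-in for ./data/google_10k_list.txt: one word per line, no header
tmp_file <- tempfile(fileext = ".txt")
writeLines(c("the", "of", "and", "to", "a"), tmp_file)

# read the header-less file and add the rank (the file is already in ranked order)
ref_ranks <- read.table(tmp_file, header = FALSE, col.names = "word_ref",
                        stringsAsFactors = FALSE)
ref_ranks$rank_ref <- seq_len(nrow(ref_ranks))

# hypothetical top words from our own corpus, already in rank order
own_ranks <- data.frame(word_own = c("the", "and", "i", "to", "a"),
                        stringsAsFactors = FALSE)
own_ranks$rank_own <- seq_len(nrow(own_ranks))

# look up each of our words in the reference list (NA = not in the reference list)
ranks_merged <- merge(own_ranks, ref_ranks,
                      by.x = "word_own", by.y = "word_ref", all.x = TRUE)

# negative values: the word ranks higher in our corpus than in the reference list
ranks_merged$rank_diff <- ranks_merged$rank_own - ranks_merged$rank_ref
ranks_merged[order(ranks_merged$rank_own), ]
```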